Acknowledgements
This data was gathered by Jake Daniels. It covers articles under the SEO tag on Medium.com, collected between 2018-01-01 and 2018-12-31. Each record includes the title, date of publication, claps generated, author, reading time, and URL. Full text of the articles will be added in the final version.
There are 19,152 articles in this dataset.
Here’s what the data looks like:
Time-series
This is what the article volume looks like over time.
Post Volume
Day-Of-Week Frequency
We can summarize articles by weekday. Later we can examine their effectiveness.
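As a rough illustration of the weekday summary, here’s a minimal sketch using only the Python standard library. The `published` list and the `weekday_counts` helper are hypothetical stand-ins; the real dataset has one publication date per article.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of publication dates (not taken from the dataset).
published = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 8), date(2018, 1, 13)]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_counts(dates):
    """Count articles per weekday, returning all seven days in calendar order."""
    counts = Counter(d.weekday() for d in dates)  # date.weekday(): Monday == 0
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}

counts = weekday_counts(published)
for day, n in counts.items():
    print(f"{day}: {n}")
```

Days with no articles are kept as zeros so the summary always has seven rows.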
Weekly Topics
We can find the most relevant word of each week by using term frequency-inverse document frequency, or TF-IDF.
If you’re curious, here’s a table of the three most relevant phrases along with their keyword of that week. You can look for the highest clap averages and see what topics were important that week.
See how the phrases relate to the keyword of that week.
Explore this data to see how each phrase relates to the word of that week. Also included are the geometric mean of claps generated (a measure of success) and the volume of posts (to check for an adequate sample size).
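To make the weekly keyword idea concrete, here’s a small sketch of TF-IDF where each week’s headlines are treated as one document. This is an assumption about the setup, not necessarily the exact pipeline used here; the headlines and the `top_terms_by_week` helper are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical headlines grouped by week number.
headlines_by_week = {
    1: ["seo tips for local business", "local seo checklist"],
    2: ["seo tips for wordpress", "wordpress plugins for seo"],
    3: ["voice search and seo", "seo for voice assistants"],
}

def top_terms_by_week(weekly, n=1):
    """Return the n highest-scoring TF-IDF words for each week."""
    # One "document" per week: the bag of all words in that week's headlines.
    docs = {week: Counter(" ".join(titles).lower().split())
            for week, titles in weekly.items()}
    n_docs = len(docs)
    # Document frequency: the number of weeks in which each word appears.
    df = Counter(word for counts in docs.values() for word in counts)
    result = {}
    for week, counts in docs.items():
        total = sum(counts.values())
        # tf = share of the week's words; idf = log(weeks / weeks containing the word).
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[week] = ranked[:n]
    return result

top = top_terms_by_week(headlines_by_week)
print(top)
```

A word like "seo" appears every week, so its inverse document frequency is zero and it never wins, which is exactly why TF-IDF surfaces what is distinctive about a week.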
Frequent Terms
Here are the words and phrases most used in headlines. The phrases have been stemmed so that variants of the same word are counted together.
Word and phrase counts give a good signal of what’s being discussed. However, we want to use data science to look further into the effectiveness of these words. Next, we’ll start grouping words together and examining their relations.
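Counting words and phrases can be sketched in a few lines. The example below counts single words and two-word phrases in some made-up headlines. It skips the stemming step for simplicity, and `ngram_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical headlines (not taken from the dataset).
headlines = [
    "local seo tips",
    "seo tips for bloggers",
    "local seo guide",
]

def ngram_counts(titles, n):
    """Count every run of n consecutive words across all titles."""
    counts = Counter()
    for title in titles:
        tokens = title.lower().split()
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

words = ngram_counts(headlines, 1)    # single words
phrases = ngram_counts(headlines, 2)  # two-word phrases

print(words.most_common(3))
print(phrases.most_common(2))
```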
Word Networks
Here is a network of the correlated words in the article headers.
We can look at these groupings and see clusters of topics.
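One way to build such a network is to score every pair of words by how often they show up in the same headline, then keep the strongly correlated pairs as edges. The sketch below uses the phi coefficient, which is an assumption: the correlation measure behind the chart isn’t stated. The headlines, the `0.5` threshold, and the helper names are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical headlines (not taken from the dataset).
headlines = [
    "local seo tips",
    "local seo guide",
    "wordpress blog tips",
    "wordpress blog themes",
]

def phi(word_a, word_b, docs):
    """Phi coefficient between the presence of two words across documents."""
    n11 = sum(1 for d in docs if word_a in d and word_b in d)
    n10 = sum(1 for d in docs if word_a in d and word_b not in d)
    n01 = sum(1 for d in docs if word_a not in d and word_b in d)
    n00 = sum(1 for d in docs if word_a not in d and word_b not in d)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def correlated_pairs(titles, threshold=0.5):
    """Return {(word_a, word_b): phi} for pairs at or above the threshold."""
    docs = [set(t.lower().split()) for t in titles]
    vocab = sorted(set().union(*docs))
    edges = {}
    for a, b in combinations(vocab, 2):
        score = phi(a, b, docs)
        if score >= threshold:
            edges[(a, b)] = score
    return edges

edges = correlated_pairs(headlines)
for pair, score in sorted(edges.items()):
    print(pair, round(score, 2))
```

Each surviving pair is an edge in the network; words that always appear together score 1, and words that never do score below zero and drop out.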
If we add another dimension, then we can see which of the networks are most effective for generating claps.
How about another dimension? The size of each circle now reflects the volume of that word.
Need help reading the final chart?
Each connected node is part of a topic. We depend on the colours to distinguish which are popular or unpopular.
Positive Trends
- Red is good, especially when it’s a larger node
- Networks with red in them represent topics that are popular
- Small red nodes will represent under-utilized topics
Negative Trends
- Blue is ineffective, especially when it’s a larger node
- Blue nodes MAY have topics that have yet to be packaged correctly
- White is neutral, these words/topics are performing at an average rate
Topic Clusters
The networks above show relationships between words that create topics.
We’ll try to create topics by looking for clusters of words. These topics can typically be inferred from the words in each cluster. We use unsupervised machine learning to do this. That means we give the computer no desired outcome to find, so it just digs for patterns that naturally occur in the dataset. It’s a simple way to get a feel for big trends in the data and what’s currently underway in the industry.
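To show what "digging for patterns" can look like, here’s a toy sketch that groups headlines with k-means on word counts and then lists the heaviest words in each group. This is a swapped-in technique for illustration; the clustering method actually used for the charts isn’t stated. The headlines and the `kmeans_topics` helper are hypothetical, and the starting centroids are simply the first `k` headlines so the result is repeatable (real runs usually start from random points).

```python
from collections import Counter

# Hypothetical headlines (not taken from the dataset).
headlines = [
    "local seo tips",
    "wordpress blog themes",
    "local seo guide",
    "wordpress blog plugins",
]

def kmeans_topics(titles, k, n_words=2, iterations=10):
    """Cluster titles with k-means and describe each cluster by its top words."""
    docs = [Counter(t.lower().split()) for t in titles]
    vocab = sorted(set().union(*docs))
    vectors = [[d[w] for w in vocab] for d in docs]
    # Assumption: deterministic start, the first k titles act as centroids.
    centroids = [list(map(float, v)) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each title joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    topics = []
    for c in range(k):
        ranked = sorted(zip(vocab, centroids[c]), key=lambda p: (-p[1], p[0]))
        topics.append([word for word, _ in ranked[:n_words]])
    return labels, topics

labels, topics = kmeans_topics(headlines, k=2)
print(labels)
print(topics)
```

Changing `k` and `n_words` is the same kind of tweaking as changing the number of clusters and words per cluster.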
Here are 5 clusters, each with 8 words that describe the topic:
This creates our topic clusters! Great for brainstorming content and knowing what’s commonly talked about. Let’s reduce the number of words and increase the number of clusters.
Tweaking the numbers can form different topics.
It’s not too hard to figure out what each cluster can represent.
- Topic 1: Web Design Services
- Topic 2: Reasons to use Social Media for your Business
- Topic 3: WordPress Tips for Blogging
- Topic 4: Local SEO
- Topic 5: Digital Marketing Training
- Topic 6: Increase your website traffic
- Topic 7: Free Tools and Guide
- Topic 8: SEO Rankings
Word Impact
Time for some interactivity.
Here are words that are impactful or overused, along with words that are proven to bring claps.
The size is based on another measurement, the geometric mean. It is often used when data is highly skewed. It can be a tie-breaker for clusters that are close together.
- Blue - topics that perform well when they are written about, which isn’t often
- Red - topics that are written about A LOT but don’t generate many claps
- Gold - strong topics to write about
- Dark Grey - average
And here’s a table of those terms. A stronger green lends more credibility to the geometric mean.
| Word | Geometric Average | Occurrences |
|---|---|---|
| content | 1.66 | 773 |
| write | 1.59 | 265 |
| blog | 1.12 | 641 |
| googl | 1.09 | 1337 |
| tool | 1.09 | 441 |
| start | 1.07 | 184 |
| guid | 1.05 | 434 |
| creat | 0.94 | 230 |
| post | 0.91 | 206 |
| organ | 0.88 | 210 |
| traffic | 0.87 | 613 |
| site | 0.84 | 572 |
| step | 0.84 | 318 |
| increas | 0.78 | 321 |
| search | 0.78 | 1350 |
| rank | 0.77 | 795 |
| build | 0.76 | 417 |
| trend | 0.76 | 207 |
| boost | 0.75 | 297 |
| keyword | 0.75 | 408 |
| reason | 0.75 | 304 |
| strategi | 0.73 | 640 |
| wordpress | 0.73 | 324 |
| link | 0.72 | 398 |
| page | 0.72 | 553 |
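For anyone curious how a geometric average of claps per word could be computed, here’s a minimal sketch. It assumes a shift of 1 before taking logs so that zero-clap articles don’t drag the result to zero; whether the table above was computed this way is not stated. The `articles` sample and the `geometric_mean_claps` helper are hypothetical.

```python
import math

# Hypothetical (headline, claps) pairs, not taken from the dataset.
articles = [
    ("content marketing guide", 0),
    ("content ideas", 3),
    ("content audit", 15),
    ("link building", 0),
]

def geometric_mean_claps(word, rows):
    """Geometric mean of claps for headlines containing the word, or None."""
    claps = [c for title, c in rows if word in title.lower().split()]
    if not claps:
        return None
    # Assumption: average log(1 + claps), then undo the shift.
    return math.exp(sum(math.log1p(c) for c in claps) / len(claps)) - 1

gm = geometric_mean_claps("content", articles)
arithmetic = (0 + 3 + 15) / 3

print(round(gm, 2), arithmetic)
```

In this sample the single 15-clap article pulls the arithmetic mean well above the geometric one, which is why the geometric mean is a better fit for skewed clap counts.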